Statistical Stylometrics and the Marlowe-Shakespeare Authorship Debate
Abstract
I. Authorship Attribution Paradigm
Historians, literary scholars, psychologists, and, more recently, computational linguists have long sought a reliable methodology for analyzing texts to determine the identity of their author. Since at least the late nineteenth century (see Mendenhall, 1887; Mascol, 1888a/b), one tool used in the investigation of authorship has been the extraction of statistical tendencies from the documents and the comparison of these data in order to group the documents appropriately. Underlying the search for such a methodology is the critical assumption that some statistically quantifiable characteristic, or set of characteristics, inherent in a single author’s use of written language can be isolated that is consistent across works by that author but differs between authors. Such features could then be used as a sort of fingerprint to distinguish between works by distinct authors and to identify the likely author of an anonymously published work. This endeavor is generally referred to as stylometry. With the advent of natural language processing and machine learning techniques in recent decades, the field has advanced greatly, and numerous researchers have addressed many specific
Similar Resources
The Marlowe-Shakespeare Authorship Debate: Approaching an Old Problem with New Methods
"We must look for consistency. Where there is want of it we must suspect deception." Sherlock Holmes in The Problem of Thor Bridge
Using Shakespeare's Sotto Voce to Determine True Identity From Text
Little is known of the private life of William Shakespeare, but he is famous for his collection of plays and poems, even though many of the works attributed to him were published anonymously. Determining the identity of Shakespeare has fascinated scholars for 400 years, and four significant figures in English literary history have been suggested as likely alternatives to Shakespeare for some di...
Searching With Style: Authorship Attribution in Classic Literature
It is a truism of literature that certain authors have a highly recognizable style. The concept of style underlies the authorship attribution techniques that have been applied to tasks such as identifying which of several authors wrote a particular news article. In this paper, we explore whether the works of authors of classic literature can be correctly identified with either of two approaches...
Measuring style with the authorship ratio: An invariant metric of lexical similarity
Stylometry is the study of the computational and mathematical properties of style. The aim of a stylometrist is to derive stylometrics, and models based upon those metrics, to quantitatively gauge stylistic propensities. This paper presents a method of formulating a stylistic distance function via a weighted ratio of lexical stylometrics: the higher the ratio, the more the styles diverge. The coef...
The Stylometric Analysis of Faustus
The electronic corpus used for the Burwick and McKusick study is as follows: the anonymous 1821 Faustus, using the text prepared for the Oxford edition; five other translations of the play, by Germaine de Staël, George Soane, Daniel Boileau (this translation is called “the Boosey translation” in the stylometrics chapter), John Anster and Lord Francis Leveson-Gower, all again from the texts prep...